About Me



• Co-founder of InfoGears 

• NYC via Montana and NJ. 

• Computer Science 

• Price comparison engine 

• @rusty_conover 
The Problems



• If the site's slow, users leave dissatisfied 
which often means lost sales. 

• Bandwidth is relatively expensive. 

• It's hard to anticipate needs. 

• Resources (time, money and people) are 
always limited. 



Other Concerns 



Thousands of people all want to watch 
your video at once. 

Viral campaigns 

The unexpected media mention. 
The solutions

Example Providers:

Load balancer sends
requests to a server it 
chooses based on a heuristic. 



All traffic goes through 
the load balancer. 
Content is pushed
to each server. 





Reverse Proxy 




Proxy server checks if request is 
cached, if not it pulls from the actual 
server and caches for future requests. 
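
The check-then-pull behaviour described above can be sketched in a few lines. This is only a toy illustration, not how Apache's mod_cache works internally: `CachingProxy` and `fetch_from_origin` are hypothetical names, everything is held in memory, and expiry is ignored.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy reverse proxy cache: serve from memory, else pull from the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin  # hypothetical function that asks the real server
        self._cache: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        # Cached already: answer without touching the real server.
        if path in self._cache:
            self.hits += 1
            return self._cache[path]
        # Not cached: pull from the real server and keep it for future requests.
        self.misses += 1
        body = self._fetch(path)
        self._cache[path] = body
        return body
```

Only the first request for a path reaches the real server; repeats are counted as hits.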



Our Solution 




Request for Static Content

Request for Dynamic Content




Amazon's EC2 



Allows you to have as many virtual machines as 
you'd like. 

• Basic - 1.0 GHz 2007 Xeon, 1.7 GB
RAM, 160 GB storage.

• Large - 8 CPUs, 15 GB RAM, 1680
GB storage, 64-bit platform.



Bandwidth... 



• Amazon EC2's bandwidth is about 250 -
1000 Mbps.

• You're only billed for what you use, which
makes it cheap, but different from what you're
used to.

• Amazon is located closer to peering 
points. There are East Coast and Europe 
DCs. 



Pricing 

It's all usage based, with discounts. 







• $0.10 per GB (incoming)

• $0.17 per GB (first 10 TB outgoing)

• Free between S3 and EC2


• $0.03 per hour Small

• $0.12 per hour Large

• Billed the entire time the machine is online.
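
To get a feel for usage-based billing, the rates on this slide can be turned into a rough estimate. The sketch below is an assumption-laden illustration: `estimate_ec2_cost` is a hypothetical helper, 10 TB is treated as 10,000 GB, and the example workload (one Small instance for 720 hours) is made up.

```python
# Rates copied from the pricing slide (USD).
RATE_PER_HOUR = {"small": 0.03, "large": 0.12}
RATE_GB_IN = 0.10
RATE_GB_OUT = 0.17          # only stated for the first 10 TB outgoing
FIRST_TIER_GB = 10_000      # assumption: 10 TB counted as 10,000 GB


def estimate_ec2_cost(size: str, hours: float, gb_in: float, gb_out: float) -> float:
    """Estimate the bill for one instance plus its traffic, in USD."""
    if gb_out > FIRST_TIER_GB:
        raise ValueError("the slide gives no rate beyond the first 10 TB outgoing")
    instance = RATE_PER_HOUR[size] * hours   # billed for every hour online
    traffic = gb_in * RATE_GB_IN + gb_out * RATE_GB_OUT
    return instance + traffic


if __name__ == "__main__":
    # Hypothetical month: one Small instance, 10 GB in, 1000 GB out.
    print(f"${estimate_ec2_cost('small', 720, 10, 1000):.2f}")
```

Traffic between S3 and EC2 is free according to the slide, so it is left out of the estimate.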



Limitations of EC2 

• When an instance is shut down, all of the
disks are wiped. Cache is cleared.

• There are no guarantees that a particular 
machine will remain up. 




So what's it good for? 



1. Parallel processing tasks without building
a server farm.

2. Building cache servers to serve your
content more quickly and cheaply than you
can.

3. Bragging about how your infrastructure 
is in "the cloud". 



How to scale 

Most websites have both static and dynamic 
content 

Serving them separately will improve response
time.







Static content

• Images, Videos, CSS, JS

• Files that don't change.

• Larger


Dynamic content

• ASP, PHP, Perl, CGI, HTML

• Files that do change

• Smaller



Dynamic Proxy/Cache 



Static requests are only sent to the reverse 
proxy/cache. 



Redirects to the real server if there is an error

http://www.foo.com/movie.mpg 

to 

http://cache.foo.com/movie.mpg 



HTTP Requests 
pulled into the cache 
and streamed to client. 



Test that the cache is 
live and returning good results. 
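
The redirect decision on this slide can be sketched as a small function. This is an illustration under assumptions, not the code behind the talk: `cache_redirect` and the extension list are hypothetical, and `cache_ok` stands in for the "cache is live and returning good results" test.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumed list of static extensions; adjust to the content actually served.
STATIC_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".mpg", ".flv")


def cache_redirect(url: str, cache_host: str, cache_ok: bool = True) -> Optional[str]:
    """Return the cache URL to redirect to, or None to serve from the real server."""
    parts = urlsplit(url)
    # Dynamic requests, or any request while the cache is failing its test,
    # stay on the real server.
    if not cache_ok or not parts.path.lower().endswith(STATIC_EXTENSIONS):
        return None
    # Same scheme, path and query, but pointed at the cache host.
    return urlunsplit((parts.scheme, cache_host, parts.path, parts.query, ""))
```

For example, `cache_redirect("http://www.foo.com/movie.mpg", "cache.foo.com")` gives `http://cache.foo.com/movie.mpg`, while a `.php` URL returns `None`.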



Traffic Types 




(Diagram labels: Request address of cache; Dynamic Content Requests; DNS Servers; Populate the active ... for the cache; Update; HTTP Redirect)



A Little About EC2 



• Amazon provides a number of disk images, 
like ISOs for base installs. 

• Fedora Core & Windows. 

• You can customize your own install but 
start with something small. 



Amazon Images 



$ ec2-describe-images -o self -o amazon
IMAGE ami-3c03e655 cache7/image.manifest.xml 314456711494 available private
IMAGE ami-20b65349 ec2-public-images/fedora-core4-base.manifest.xml amazon available public



Create an instance 



Create an instance:

$ ec2-run-instances ami-2103e648

Showing current instances:

$ ec2-describe-instances
RESERVATION r-9a8076f3 314456711494 default
INSTANCE i-4603fc2f ami-3c03e655 ec2-72-44-35-86.z-2.compute-1.amazonaws.com domU-12-31-35-00-09-C2.z-2.compute-1.internal running 0 m1.small 2008-02-20T22:07:13+0000

Stopping and cleaning up an instance:

$ ec2-terminate-instances i-4603fc2f
INSTANCE i-4603fc2f terminated terminated



DNS 



EC2 will go down, when you least expect it. 

You don't want the users to get errors and you 
don't want to be sending requests to a down 
server for very long. 

Use dynamic DNS updates and keep very short
TTL times for the records, or use EC2's static
addresses.

Monitoring and DNS code need to be reliable;
run them from more than one separate network.



DNS Redirection 



If you host more than one website, generally you
don't want to set up instances for every domain.

Set up one caching instance, and then create CNAME
records for all of your other domains.

For instance, to cache requests at
www.prolitegear.com I can use cache.prolitegear.com,
which is a CNAME for ccache.infogears.com.



DNS Flowchart 



Request for 
cache.icebreaker.com 



Reply is CNAME 
ccache.infogears.com 
TTL: 4 hours



Request for ccache.infogears.com 



Is Amazon 
server online? 



Reply is 4.4.4.4 
TTL: 10 seconds 



Reply is 7.7.7.7 
TTL: 10 seconds
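
The last step of the flowchart is a simple choice between two short-lived answers. A minimal sketch follows; which address belongs to which branch is an assumption (the flowchart only shows the two replies), and `DnsAnswer` and `answer_for_cache_host` are hypothetical names.

```python
from typing import NamedTuple


class DnsAnswer(NamedTuple):
    address: str
    ttl_seconds: int


# Assumption: 4.4.4.4 is the Amazon cache and 7.7.7.7 is the fallback server.
AMAZON_ADDRESS = "4.4.4.4"
FALLBACK_ADDRESS = "7.7.7.7"
SHORT_TTL = 10  # seconds, as on the flowchart


def answer_for_cache_host(amazon_online: bool) -> DnsAnswer:
    """Pick the A record for ccache.infogears.com based on the health check."""
    address = AMAZON_ADDRESS if amazon_online else FALLBACK_ADDRESS
    return DnsAnswer(address, SHORT_TTL)
```

The CNAME can keep a long TTL because it never changes; only the short-TTL address record has to follow the state of the Amazon server.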



The Cache Stack 




Setup 

• You need to build in mod_cache, mod_proxy & 
mod_rewrite. 

• Keep the server as small as possible, no PHP or 
mod_perl. 

• You can set it up to use a memory or disk 
cache. 

./configure --enable-cache --enable-mem-cache --enable-disk-cache \
    --enable-proxy --enable-proxy-http --enable-status --enable-info \
    --enable-rewrite --disable-proxy-ftp --disable-proxy-ajp \
    --disable-proxy-balancer --enable-deflate --disable-cgi --disable-cgid \
    --disable-userdir --disable-alias --disable-cgid --disable-actions \
    --disable-negotiation --disable-asis --disable-info --disable-filter \
    --disable-static --enable-headers





Logging



It's good to judge cache hits to make sure your cache is 
working. 



LogFormat "%{Host}i %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{Age}o" proxy



The Age header contains the age of the cached result in
seconds. If it is not found, Apache logs "-".



Logs should be sent back to reliable storage every so 
often. 
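
One way to judge cache hits from these logs is to look at the last field of each line. The sketch below assumes the Age value is the final field and that a numeric Age means the response came from the cache; `cache_hit_rate` is a hypothetical helper, not part of Apache.

```python
from typing import Iterable, Tuple


def cache_hit_rate(lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (hits, total, hit rate) for log lines that end with the Age field."""
    hits = total = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        total += 1
        # A numeric Age means a cached result; "-" means no Age header was sent.
        if fields[-1].isdigit():
            hits += 1
    rate = hits / total if total else 0.0
    return hits, total, rate
```

It can be fed an open log file directly, for example `cache_hit_rate(open("access_log"))`, where the file name is a placeholder.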




• Make sure you don't make an open proxy. 



• Our proxy requests will only be the result of rewrite 
rules. 




RewriteMap lowercase int:tolower
RewriteMap cachehost txt:/usr/local/apache2/conf/cache-host.map
RewriteCond ${lowercase:%{SERVER_NAME}} ^(.+)$
RewriteCond ${cachehost:%1} ^(.+)$
RewriteRule ^/(.*\.(gif|jpg|jpeg|png))$ http://%1/$1 [P,L,NC]
RewriteRule ^/$ http://www.infogears.com [R,L]

The rewrite rule is what changes the cached URL into the real URL to
pull for the request.



The map file just lists the cache host name [TAB] destination host 
name. 
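
To make the lookup concrete, here is a rough Python emulation of what the map and the image rule do together. It is a sketch for reasoning about the rules, not a replacement for mod_rewrite; `parse_host_map` and `upstream_url` are hypothetical names.

```python
import re
from typing import Dict, Optional

# Same pattern as the image rule; IGNORECASE mirrors the NC flag.
IMAGE_PATH = re.compile(r"^/(.*\.(gif|jpg|jpeg|png))$", re.IGNORECASE)


def parse_host_map(text: str) -> Dict[str, str]:
    """Parse 'cache host <whitespace> destination host' lines into a dict."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0].lower()] = parts[1]
    return mapping


def upstream_url(host_map: Dict[str, str], server_name: str, path: str) -> Optional[str]:
    """Return the real URL to pull, or None if the request must not be proxied."""
    origin = host_map.get(server_name.lower())  # lowercased like the tolower map
    match = IMAGE_PATH.match(path)
    if origin is None or match is None:
        return None  # unknown host or non-image path: never proxy
    return f"http://{origin}/{match.group(1)}"
```

Because only hosts listed in the map and paths matching the pattern produce a URL, the sketch also shows why these rules do not create an open proxy.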




Host Mapping

# Destination Host             Source Host
static-cache.gearbuyer.com     static.gearbuyer.com
images-cache.gearbuyer.com     images.gearbuyer.com
cache.gearbuyer.com            www.gearbuyer.com




Make sure that the cache root exists before Apache starts, otherwise it
won't start. /mnt is a good place.

Make sure you have the correct permissions so Apache can write to the 
directory. 



Change the directory levels and limits to suit your needs. 
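
A pre-start check along these lines can catch a missing or unwritable cache root after an instance reboot. This is a minimal sketch: `ensure_cache_root` is a hypothetical helper, and the answer is only meaningful when it runs as the same user Apache runs as.

```python
import os


def ensure_cache_root(path: str) -> bool:
    """Create the cache root if missing; return True if this user can write to it."""
    os.makedirs(path, exist_ok=True)
    # Writing cache files needs both write and search permission on the directory.
    return os.access(path, os.W_OK | os.X_OK)
```

It could be called from a boot script before Apache is started, with a path under /mnt chosen to match the cache root in the Apache configuration.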



Scaling




ServerLimit 600 
StartServers 20 
MinSpareServers 20 
MaxSpareServers 60 
MaxClients 500 
MaxRequestsPerChild 0 



MaxKeepAliveRequests 1000 
KeepAlive On 
KeepAliveTimeout 10 
SendBufferSize 98303



Since you're serving static requests it won't take much RAM to 
scale out more processes. 



Keep alive connections should persist as they prevent another 
TCP handshake. 



Monitoring

Monitoring is important to make 
sure that EC2 can reach your 
servers, and your EC2 server is still 
running. 

I use Perl for this since it has 
everything I need: a way to update 
DNS and a way to send web 
requests. 

SNMP Traffic Monitoring is also 
essential. 




Monitoring 



Do this forever 





Set address to
fallback server



Set Amazon address 



Hit Rate 



Set Expires header on everything you 
can. 

ExpiresActive on
ExpiresByType image/gif "access plus 8 hours"
ExpiresByType image/png "access plus 8 hours"
ExpiresByType image/jpeg "access plus 8 hours"
ExpiresByType text/css "access plus 8 hours"
ExpiresByType application/x-javascript "access plus 8 hours"
ExpiresByType application/x-shockwave-flash "access plus 8 hours"
ExpiresByType video/x-flv "access plus 8 hours"
ExpiresByType application/pdf "access plus 8 hours"




You can force refreshes by doing a reload, 
or using wget --no-cache 
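
A forced refresh can also be scripted. The sketch below builds a request carrying no-cache directives, similar in spirit to what a browser reload or wget --no-cache sends. `no_cache_request` is a hypothetical helper, and whether the cache honours the directives depends on how it is configured.

```python
import urllib.request


def no_cache_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks caches along the way for a fresh copy."""
    req = urllib.request.Request(url)
    req.add_header("Cache-Control", "no-cache")  # HTTP/1.1 directive
    req.add_header("Pragma", "no-cache")         # HTTP/1.0 equivalent
    return req
```

Passing the result to `urllib.request.urlopen` and reading the Age header of the response is one way to see whether the object was refetched.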



SNMP Monitoring 




These graphs are generated by Cacti 
Amazon CloudFront



• Price: $0.01 per 10k requests, $0.17/GB traffic +
S3 costs

• Sept: $62 for hits, $253.30 for traffic = $316, not
counting S3 costs.

• Have to preload all resources to S3. Cache 
has about 2.2 million objects in 36 gigs. 
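
The September figures can be worked backwards into usage with the prices above. The sketch below only does that arithmetic: `cloudfront_cost` and `implied_usage` are hypothetical helpers, and S3 costs are left out as on the slide.

```python
from typing import Tuple

REQUEST_RATE = 0.01 / 10_000  # USD per request ($0.01 per 10k requests)
TRAFFIC_RATE = 0.17           # USD per GB of traffic


def cloudfront_cost(requests: int, traffic_gb: float) -> float:
    """Cost in USD for a number of requests and GB served, excluding S3."""
    return requests * REQUEST_RATE + traffic_gb * TRAFFIC_RATE


def implied_usage(request_cost: float, traffic_cost: float) -> Tuple[int, float]:
    """Work back from the two bill components to requests and GB served."""
    return round(request_cost / REQUEST_RATE), traffic_cost / TRAFFIC_RATE


if __name__ == "__main__":
    requests, gigabytes = implied_usage(62.0, 253.30)
    print(requests, round(gigabytes))
```

At these rates, $62 of hits corresponds to about 62 million requests and $253.30 of traffic to about 1,490 GB. The two components add up to $315.30, which the slide gives as $316.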



